249 research outputs found
Génération modulaire de grammaires formelles (Modular Generation of Formal Grammars)
The work presented in this thesis aims to facilitate the development of resources for natural language processing. Such resources take many different forms, because there are several levels of linguistic description (syntax, morphology, semantics, ...) and several formalisms proposed for describing natural languages at each of these levels. Because these formalisms involve different types of structures, a single description language is not enough: each formalism requires its own domain-specific language (DSL) and a new tool that uses this language, which is a long and complex task. For this reason, we propose in this thesis a method for modularly assembling, and adapting, development frameworks specific to tasks of linguistic resource generation. The frameworks assembled with our method are built around the fundamental concepts of the XMG (eXtensible MetaGrammar) approach, which allows the generation of tree-based grammars: a description language that supports the modular definition of abstractions over linguistic structures, together with their non-deterministic combination (that is, by means of the logical operators of conjunction and disjunction). The method relies on assembling a description language from reusable bricks, according to a single specification file. The entire processing chain for the DSL thus defined is assembled automatically from this same specification. We first validated this approach by recreating the XMG tool from elementary bricks. Collaborations with linguists also led us to assemble compilers for describing the morphology of Ikota (a Bantu language) and semantics (by means of frame theory).
Describing São Tomense Using a Tree-Adjoining Meta-Grammar
Poster session. International audience.
In this paper, we show how the interactions between the tense, aspect and mood preverbal markers in São Tomense can be formally and concisely described at an abstract level, using the concept of projection. More precisely, we show how to encode the different valid orders of preverbal markers in an abstract description of a Tree-Adjoining Grammar of São Tomense. This description is written using the XMG meta-grammar language (Crabbé and Duchier, 2004).
Décrire la morphologie des verbes en ikota au moyen d'une métagrammaire (Describing the Morphology of Ikota Verbs Using a Metagrammar)
Association pour le Traitement Automatique des Langues. This article has been published in the Proceedings of the JEP-TALN-RECITAL 2012 conference. Available online at https://www.aclweb.org/anthology/W/W12/W12-1309.pdf. National audience.
In this article, we show how the concept of metagrammars, initially introduced by Candito (1996) for designing tree-adjoining grammars describing the syntax of French and Italian, can be applied to the description of the morphology of Ikota, a Bantu language spoken in Gabon. Here, we use the expressivity of the XMG (eXtensible MetaGrammar) formalism to describe the morphological variations of verbs in Ikota. This XMG specification captures the generalizations across these variations. To produce a lexicon of inflected forms, the XMG specification can be compiled and the result saved in an XML file, which allows it to be reused in dedicated applications.
The Origins of a Rich Absorption Line Complex in a Quasar at Redshift 3.45
We discuss the nature and origin of a rich complex of narrow absorption lines
in the quasar J102325.31+514251.0 at redshift 3.447. We measure nine C
IV(\lambda1548,1551) absorption line systems with velocities from -1400 to
-6200 km/s, and full widths at half minimum ranging from 16 to 350 km/s. We
also detect other absorption lines in these systems, including H I, C III, N V,
O VI, and Si IV. Lower ionisation lines are not present, indicating a generally
high degree of ionisation in all nine systems. The total hydrogen column
densities range from <=10^{17.2} to 10^{19.1}cm^{-2}. We examine several
diagnostics to estimate more directly the location and origin of each absorber.
Four of the systems can be attributed to a quasar-driven outflow based on line
profiles that are smooth and broad compared to thermal line widths. Several
systems also have other indicators of a quasar outflow origin, including
partial covering. Altogether there is direct evidence for 6 of the 9 systems
forming in a quasar outflow. Consistent with a near-quasar origin, eight of the
systems have metallicity values or lower limits in the range Z >= 1-8 Z_{sun}.
The lowest velocity system, which has an ambiguous location, also has the
lowest metallicity, Z <= 0.3 Z_{sun}, and might form in a non-outflow
environment farther from the quasar. Overall, however, this complex of narrow
absorption lines can be identified with a highly structured, multi-component
outflow from the quasar. The high metallicities are similar to those derived
for other quasars at similar redshifts and luminosities, and are consistent
with evolution scenarios wherein quasars appear after the main episodes of star
formation and metal enrichment in the host galaxies.
Comment: 16 pages, 12 figures, Accepted to MNRAS, July 201
Plasmacytoid Dendritic Cell Infection and Sensing Capacity during Pathogenic and Nonpathogenic Simian Immunodeficiency Virus Infection.
International audience.
Human immunodeficiency virus (HIV) in humans and simian immunodeficiency virus (SIV) in macaques (MAC) lead to chronic inflammation and AIDS. Natural hosts, such as African green monkeys (AGM) and sooty mangabeys (SM), are protected against SIV-induced chronic inflammation and AIDS. Here, we report that AGM plasmacytoid dendritic cells (pDC) express extremely low levels of CD4, unlike MAC and human pDC. Despite this, AGM pDC efficiently sensed SIVagm, but not heterologous HIV/SIV isolates, indicating a virus-host adaptation. Moreover, both AGM and SM pDC were found to be, in contrast to MAC pDC, predominantly negative for CCR5. Despite such limited CD4 and CCR5 expression, lymphoid tissue pDC were infected to a degree similar to that seen with CD4(+) T cells in both MAC and AGM. Altogether, our finding of efficient pDC infection by SIV in vivo identifies pDC as a potential viral reservoir in lymphoid tissues. We discovered low expression of CD4 on AGM pDC, which did not preclude efficient sensing of host-adapted viruses. Therefore, pDC infection and efficient sensing are not prerequisites for chronic inflammation. The high level of pDC infection by SIVagm suggests that if CCR5 paucity on immune cells is important for nonpathogenesis of natural hosts, it is possibly not due to its role as a coreceptor.
The ability of certain key immune cell subsets to resist infection might contribute to the asymptomatic nature of simian immunodeficiency virus (SIV) infection in its natural hosts, such as African green monkeys (AGM) and sooty mangabeys (SM). This relative resistance to infection has been correlated with reduced expression of CD4 and/or CCR5. We show that plasmacytoid dendritic cells (pDC) of natural hosts display reduced CD4 and/or CCR5 expression, unlike macaque pDC. Surprisingly, this did not protect AGM pDC, as infection levels were similar to those found in MAC pDC.
Furthermore, we show that AGM pDC did not consistently produce type I interferon (IFN-I) upon heterologous SIVmac/HIV type 1 (HIV-1) encounter, while they sensed autologous SIVagm isolates. Pseudotyping SIVmac/HIV-1 overcame this deficiency, suggesting that reduced uptake of heterologous viral strains underlies this lack of sensing. The distinct IFN-I responses depending on host species and HIV/SIV isolates reveal the host/virus species specificity of pDC sensing.
The Eighth Data Release of the Sloan Digital Sky Survey: First Data from SDSS-III
The Sloan Digital Sky Survey (SDSS) started a new phase in August 2008, with
new instrumentation and new surveys focused on Galactic structure and chemical
evolution, measurements of the baryon oscillation feature in the clustering of
galaxies and the quasar Ly alpha forest, and a radial velocity search for
planets around ~8000 stars. This paper describes the first data release of
SDSS-III (and the eighth counting from the beginning of the SDSS). The release
includes five-band imaging of roughly 5200 deg^2 in the Southern Galactic Cap,
bringing the total footprint of the SDSS imaging to 14,555 deg^2, or over a
third of the Celestial Sphere. All the imaging data have been reprocessed with
an improved sky-subtraction algorithm and a final, self-consistent photometric
recalibration and flat-field determination. This release also includes all data
from the second phase of the Sloan Extension for Galactic Understanding and
Evolution (SEGUE-2), consisting of spectroscopy of approximately 118,000 stars
at both high and low Galactic latitudes. All the more than half a million
stellar spectra obtained with the SDSS spectrograph have been reprocessed
through an improved stellar parameters pipeline, which has better determination
of metallicity for high metallicity stars.
Comment: Astrophysical Journal Supplements, in press (minor updates from
submitted version)
Representation and parsing of multiword expressions: Current trends
This book consists of contributions on the definition, representation and parsing of multiword expressions (MWEs), reflecting current trends in how MWEs are represented and processed. The contributions cover various categories of MWEs (verbal, adverbial and nominal), various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages (including English, French, Modern Greek, Hebrew and Norwegian), and various applications (namely MWE detection, parsing and automatic translation), using both symbolic and statistical approaches.